Setup and Description of Dataset

Bivalent-level traits and metrics are added in the src/Setup_BivData.Rmd script. These observations come from the automated image analysis algorithm and have been curated (incorrect algorithm output was discarded). The MLH1 data file is also loaded into this file.

The breakdown of curated bivalents by category

Chrm Class proportions

The figure below shows the average percentages of chromosome classes per category.

These results agree with the earlier figure I made from hand-measured percentages (dozens of observations instead of hundreds). They also fit with the rapid male evolution of genome-wide recombination rate (gwRR) in MSM and PWD.

  • Look into ordering the mouse strains to highlight the MSM and PWD differences, renaming the chrm type labels, spacing out the x axis labels or removing some of the other mice.

Chromosome Size Effect

In an attempt to test for and mitigate noise based on chromosome identity/physical size, I tried to isolate bivalents which are in the top quartile for SC length within their cells.

Since the majority of cells from the Automated BivData set do not have the complete complement of bivalents, this subset is much smaller and might include ‘smaller’ bivalents from cells that are missing the longer bivalents. But assuming the algorithm’s identification of bivalents is random, with a large enough sample size this error should be minimized.

Below are examples of plots which show the SC length distributions across cells. The top figure shows whole cell hand measured data and the bottom shows the Automated bivData from cells with at least 15 bivalents measured. There are plot pairs for each category, but they are excluded for space.

Each point is a bivalent, plotted by cell on the x axis. Xs mark the 4th quartile, the large point is the mean and the smaller black point is the median. I’m using these to compare the patterns of these statistics in the automated data set, which is missing some bivalent data.

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).

## Warning: Removed 41 rows containing missing values (geom_point).

## Warning: Removed 41 rows containing missing values (geom_point).

All the plots above show the distributions of manually whole cell measured SC lengths compared to the SC length distributions from the automated bivalent data. It shows the amount of within cell variance across strains. There is a bit of variance across the SC length distributions in the PWD females.

I’m proposing to take all the automated bivalent measures in each cell’s 4th quartile (above the cell’s 75th percentile for SC length) as a data set of ‘longest’ bivalents. We would predict this set to exclude the shortest chromosomes, which would have stronger chromosome size effects.
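A minimal sketch of how such a ‘longest bivalents’ subset could be built, assuming a per-bivalent table with a cell identifier and an SC length column. The table `toy_biv` and the column `cell_id` are hypothetical stand-ins with simulated values; `chromosomeLength` mirrors the column name used in Curated_BivData. The 15-bivalent cutoff follows the plots described above.

```r
set.seed(42)

# Toy stand-in for the automated bivalent table: 3 cells with 19, 16 and 10 bivalents
toy_biv <- data.frame(
  cell_id = rep(c("cellA", "cellB", "cellC"), times = c(19, 16, 10)),
  chromosomeLength = round(runif(45, min = 40, max = 120), 1)
)

# Number of bivalents measured in each cell, repeated on every row of that cell
toy_biv$n_biv <- ave(toy_biv$chromosomeLength, toy_biv$cell_id, FUN = length)

# Per-cell 75th percentile of SC length (the lower bound of the 4th quartile)
toy_biv$q75 <- ave(toy_biv$chromosomeLength, toy_biv$cell_id,
                   FUN = function(x) quantile(x, probs = 0.75))

# Keep bivalents above their own cell's 75th percentile,
# and only from cells with at least 15 measured bivalents
long_biv <- toy_biv[toy_biv$n_biv >= 15 &
                      toy_biv$chromosomeLength > toy_biv$q75, ]

table(long_biv$cell_id)
```

With 15 to 19 bivalents per cell this keeps roughly the 4 to 5 longest SCs of each cell, and cells with too few measured bivalents drop out entirely.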

But this data set might be noisy, given the amount of variance in the SC length distributions across cells (PWD females, WSB females). But it’s easier to just run another set of analysis.

The DF real.long.bivData contains 680 bivalent measures; the full data set has 10202. This is the breakdown of bivalent observations by category.

Still very unbalanced across sexes and strains ~ merge with the manual whole cell biv measures later… Musc males are very under-represented. MSM male only has data from 1 cell.

  • Try to merge this DF with the whole.cell manual measures.

  • Try estimating which ‘loose’ bivalent observations might be within the long class of bivalents.

In the code chunk above I calculated the mouse averages for the longest bivalents (680 bivalents from 54 mice, compared with 10202 bivalents from 86 mice in the full data set).

## Warning: Removed 3 rows containing missing values (geom_point).

These plots show the SC lengths for the ‘long SC data set’. They are supposed to be the longest 4-5 SCs from cells where I could get good measures. These longer bivalents are useful because their patterns shouldn’t be affected by the chromosome size effect (which affects CO position). Hopefully this data set will have less noise from chromosome identity, but data are still missing (the bivalents don’t come from whole cell measures).

Adjusting for XX

The female mouse averages should be adjusted for the XX. I am working on code to estimate the XX SC length from the 3rd largest bivalent in the female whole cell data across strains, and then subtract this amount from the female mouse averages. This isn’t the best solution, since I can’t determine what proportion of the cells behind the female mouse averages include the XX (most cells are missing at least 3 bivalents).

  • Of all female single bivalents observations, 5% are XX (1 of 20).

  • The XX is large, likely within the top 25% longest bivalents of the cell (3rd largest by Mb).

  • The average % of XX for whole cell SC (sum(all bivalents)) can be calculated from the whole.cell data set. Let’s guess 12% of a cell’s total SC area is XX.

I think a formula something like this can be applied to adjust for XX

  • (rate of bivalent segmentation) * (rate of XX, 5%) * (mean SC length of the 3rd longest bivalent) / (total SC area, summed by bivalent) = proportion of SC area due to XX
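The first ingredient of this adjustment, the SC length of the 3rd longest bivalent per cell and its share of the cell’s total SC, could be estimated along these lines. This is only a sketch: `whole_cell`, `cell_id` and `SC_length` are hypothetical names, and the numbers are made-up toy values, not measurements.

```r
# Toy whole-cell table: all 20 bivalents of two hypothetical female cells
whole_cell <- data.frame(
  cell_id = rep(c("f1", "f2"), each = 20),
  SC_length = c(seq(60, 117, by = 3), seq(58, 115, by = 3))
)

# SC length of the 3rd longest bivalent in each cell (the stand-in for the XX)
third_longest <- tapply(whole_cell$SC_length, whole_cell$cell_id,
                        function(x) sort(x, decreasing = TRUE)[3])

# Total SC per cell, summed over bivalents
total_sc <- tapply(whole_cell$SC_length, whole_cell$cell_id, sum)

# Share of each cell's total SC taken up by the 3rd longest bivalent
xx_share <- third_longest / total_sc

third_longest
round(xx_share, 3)
```

Averaging `third_longest` within strain would give the strain-level value to plug into the formula above.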

Applying Mix Model framework

I will use two different mixed models using lme() and lmer(), because the different models have limitations for reporting p values for fixed and random effects.

Pros and con list for using mouse averages vs single bivalent traits.

Pro Bivalent level

  • The data are continuous and closer to normal/Gaussian expectations than the MLH1 counts.

  • There is within cell variance that is an issue.

Pro Mouse level averages

  • Mouse average is a conservative framework; it would summarize general patterns, which fits with the paper.

  • Simple application of the same MM.

  • Limited work to add to the data set if I can justify that the collection of single bivalents are random and equivalent across mice.

The mouse averages for all the metrics are what is used in the mixed models.

In the chunk above the mouse averages table is made – may need to add all the extra metrics (IFD, .

Predictions, Heterochiasmy

Using the Mixed model framework which tests the effects (and interactions) of subspecies, sex and strain, I will test for evolution of the following traits.

Two bivalent level traits are predicted to display heterochiasmy (ie significant effects of sex);

  1. SC length will be sexually dimorphic (sex effect will be significant) (cite Lynn).

2.A) Normalized 1CO positions will be sexually dimorphic (Sardell and Kirkpatrick).

2.B) Sister cohesion tension (sis-co-ten) will be sexually dimorphic, as it reflects the general property of uniform vs telomere-biased CO positioning.

2.C) Centromere and telomere distances will be sexually dimorphic.

  3. Interference / IFD will not be sexually dimorphic. Previous measures have been shown to lack sexual dimorphism (cite Petkov).

Predictions, Male Polymorphism (High vs Low Rec strains)

A.1) SC lengths will be longer for high Rec strains.

B.1) Interference/IFD will be shorter in high Rec strains.

C.1) It’s unknown whether normalized 1CO positions will differ between high and low rec Musc strains (no good predictions come to mind).

C.2) sis-co-ten metric will be maximized in higher recombining strains .. because.

C.3) telomere and centromere distances will be shorter in low rec strains

Using the same MM framework we would expect the random strain effect would be significant for these predictions.

Use just the male (Musc male data)

\[mouse \ average \ SC \ length ~=~ rand(strain) + \varepsilon \]

\[mouse \ average \ IFD \ trait ~=~ rand(strain) + \varepsilon \]

The IFD traits are:

  • IFD_ABS

  • IFD_PER

\[mouse \ average \ CO \ position \ trait ~=~ rand(strain) + \varepsilon \]

The CO position traits are:

  • 1CO position

  • centromere and telomere distance

  • sis-co-ten

SC Lengths

## Warning: Removed 29 rows containing non-finite values (stat_boxplot).
## Warning: Removed 30 rows containing missing values (geom_point).

First draft of the SC length plot. I’d like to add the hand foci count to the x axis. (also add the PWD - male data)

TRY ADDING NESTED box plots for hand.foci.count

Mixed Model Tests

\[mouse \ average \ SC \ length ~=~ subsp * sex + rand(strain*sex) + \varepsilon \]

To estimate random effects, I’m using lmer (not lme where I could do the rand(strain*sex)). So now it’s just random(strain).

The evolution will be tested with the subsp fixed effect and sexual dimorphism of the trait will be tested with the sex and interaction effects.

Remember: for the MM, use the table mouse.avs_4MM. The long bivalent data set is in real.long.bivData.

I used the mixed model with the lme() for the full Curated BivData and long bivdata

\[mouse \ average \ SC \ length ~=~ subsp * sex + rand(strain) + \varepsilon \]
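The two model fits described here can be sketched as follows, assuming the nlme and lme4 packages are installed. The data frame `toy` is simulated and purely illustrative: its strain labels and effect sizes are invented, and only the column names `mean_SC`, `subsp`, `sex` and `strain` follow the tables used in this document.

```r
library(nlme)   # lme(): reports F tests and p values for the fixed effects
library(lme4)   # lmer(): the fit used here for the random strain effect

set.seed(1)

# Simulated mouse averages: 6 hypothetical strains x 2 sexes x 4 mice
toy <- expand.grid(mouse = 1:4,
                   strain = paste0("s", 1:6),
                   sex = c("female", "male"))
toy$subsp <- ifelse(toy$strain %in% c("s1", "s2", "s3"), "Dom", "Musc")

# Invented strain effects, so there is strain-level variance to estimate
strain_eff <- setNames(rnorm(6, sd = 3), paste0("s", 1:6))
toy$mean_SC <- 80 + 10 * (toy$sex == "female") +
  strain_eff[as.character(toy$strain)] + rnorm(nrow(toy), sd = 2)

# mouse average SC length = subsp * sex + rand(strain) + error
fit_lme <- lme(mean_SC ~ subsp * sex, random = ~ 1 | strain, data = toy)
anova(fit_lme)                 # fixed-effect tests

fit_lmer <- lmer(mean_SC ~ subsp * sex + (1 | strain), data = toy)
lme4::fixef(fit_lmer)          # fixed-effect estimates
VarCorr(fit_lmer)              # strain and residual variance components
```

Both calls fit the same random-intercept model. They differ mainly in what they report, which is why the document uses lme() for the fixed-effect p values and lmer() for the random effect.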

Heterochiasmy Prediction

Is sex a significant effect for SC length? (as predicted)

  • Sex is a significant effect for SC length. Consider writing a subsampling approach (randomize / permute a data set of BivData) to check this result.

  • According to anova, sex effect explains most of the variance in single bivalent SC lengths.

  • The Long Biv Data set largely agrees with the full curated dataset

  • A caveat I haven’t addressed yet is the XX in the female bivalent data averages.

Is the random Effect of strain an effect on SC length?

The exactRLRT() test indicates that the random strain effect might be significant, p= xx
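A sketch of how the random strain effect can be tested with exactRLRT() from the RLRsim package, assuming RLRsim and lme4 are installed. The data are simulated and the strain labels are hypothetical, so the resulting p value says nothing about the real mice.

```r
library(lme4)
library(RLRsim)

set.seed(2)

# Simulated mouse averages: 6 hypothetical strains x 2 sexes x 4 mice
toy <- expand.grid(mouse = 1:4,
                   strain = paste0("s", 1:6),
                   sex = c("female", "male"))
toy$subsp <- ifelse(toy$strain %in% c("s1", "s2", "s3"), "Dom", "Musc")
strain_eff <- setNames(rnorm(6, sd = 3), paste0("s", 1:6))
toy$mean_SC <- 80 + 10 * (toy$sex == "female") +
  strain_eff[as.character(toy$strain)] + rnorm(nrow(toy), sd = 2)

# REML fit with a single random effect, which is what exactRLRT() expects
fit <- lmer(mean_SC ~ subsp * sex + (1 | strain), data = toy)

# Restricted likelihood ratio test of H0: strain variance = 0,
# with the null distribution simulated 10000 times
rlrt <- exactRLRT(fit, nsim = 10000, seed = 2)
rlrt$statistic
rlrt$p.value
```

The p value reported in the text would come from `rlrt$p.value` on the real mouse averages.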

Male Polymorphism Prediction

Prediction, High rec males have longer SC.

## 
## Call:
## glm(formula = Rec.group ~ mean_SC, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.41174  -0.39326  -0.15951   0.09663   2.29849  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -72.2866    31.3071  -2.309   0.0209 *
## mean_SC       0.8700     0.3802   2.288   0.0221 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.789  on 22  degrees of freedom
## Residual deviance: 12.171  on 21  degrees of freedom
## AIC: 16.171
## 
## Number of Fisher Scoring iterations: 7
## 
## Call:
## glm(formula = Rec.group ~ mean.SC_1CO, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3268  -1.0086  -0.6663   1.1416   1.8410  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -13.3861     9.9681  -1.343    0.179
## mean.SC_1CO   0.1683     0.1290   1.305    0.192
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.789  on 22  degrees of freedom
## Residual deviance: 28.847  on 21  degrees of freedom
## AIC: 32.847
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ mean.SC_2CO, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7948  -0.8077  -0.6259   0.9636   1.7535  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -23.4238    11.4114  -2.053   0.0401 *
## mean.SC_2CO   0.2439     0.1210   2.015   0.0439 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.789  on 22  degrees of freedom
## Residual deviance: 25.145  on 21  degrees of freedom
## AIC: 29.145
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ mean.SC_3CO, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1803  -1.0086  -0.8022   1.1610   1.8038  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.61764    6.16006  -1.074    0.283
## mean.SC_3CO  0.06432    0.06253   1.029    0.304
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 19.121  on 13  degrees of freedom
## Residual deviance: 17.754  on 12  degrees of freedom
##   (9 observations deleted due to missingness)
## AIC: 21.754
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2550  -0.8769  -0.7660   1.3523   1.8586  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -2.332885   0.182192 -12.805   <2e-16 ***
## chromosomeLength  0.018213   0.002096   8.689   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3597.0  on 2914  degrees of freedom
## Residual deviance: 3519.2  on 2913  degrees of freedom
## AIC: 3523.2
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male") & (Curated_BivData$hand.foci.count == 
##         1), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8365  -0.7642  -0.7437  -0.7059   1.7278  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -1.410168   0.232913  -6.054 1.41e-09 ***
## chromosomeLength  0.003913   0.002915   1.343    0.179    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1995.3  on 1779  degrees of freedom
## Residual deviance: 1993.5  on 1778  degrees of freedom
##   (236 observations deleted due to missingness)
## AIC: 1997.5
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male") & (Curated_BivData$hand.foci.count == 
##         2), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4643  -1.1234  -0.9415   1.2004   1.5128  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -1.858381   0.427168  -4.350 1.36e-05 ***
## chromosomeLength  0.018603   0.004459   4.172 3.01e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1142.9  on 825  degrees of freedom
## Residual deviance: 1125.0  on 824  degrees of freedom
##   (236 observations deleted due to missingness)
## AIC: 1129
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male") & (Curated_BivData$hand.foci.count == 
##         3), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1792  -1.0765  -0.9714   1.2452   1.5310  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -1.41431    2.06647  -0.684    0.494
## chromosomeLength  0.01202    0.02099   0.573    0.567
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.860  on 31  degrees of freedom
## Residual deviance: 43.526  on 30  degrees of freedom
##   (236 observations deleted due to missingness)
## AIC: 47.526
## 
## Number of Fisher Scoring iterations: 4

The mean SC logistic regression coefficients (intercept, slope) for the mouse averages are

-72.2866086, 0.8700302

and for the single bivalent levels

-2.3328849, 0.0182132
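To make the mouse-average coefficients easier to read, they can be converted into a predicted probability of belonging to the high Rec group. The sketch below only reuses the two mouse-average coefficients printed above; the helper name `p_high` is hypothetical, and mean SC is in whatever units mean_SC was measured in.

```r
# Coefficients of Rec.group ~ mean_SC for Musc male mouse averages (from the summary above)
b0 <- -72.2866086   # intercept
b1 <- 0.8700302     # slope per unit of mean SC length

# Predicted probability that a mouse belongs to the high Rec group
p_high <- function(mean_SC) plogis(b0 + b1 * mean_SC)

# Mean SC length at which the model gives a 50% chance of either group
midpoint <- -b0 / b1

# Multiplicative change in the odds of 'high Rec' per one-unit increase in mean SC
odds_ratio <- exp(b1)

midpoint
odds_ratio
p_high(midpoint + c(-5, 0, 5))
```

The steep slope means the fitted probability switches from near 0 to near 1 over a narrow range of mean SC around the midpoint, which is expected with only 23 mice.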

t.test for SC Length

The mouse averages and the single bivalents are significantly longer in the high rec group (p = 0.0015668 and p = 1.4599082 × 10^{-17}, respectively).

I can’t get the above mixed models to work for testing high vs low rec groups

# Hold off on reporting / working with the long SC biv results

# Male mice only, from the long-bivalent mouse averages
long.SC.mouse.avs_4MM_male <- Long_biv_mouse.avs_4MM[Long_biv_mouse.avs_4MM$sex == "male", ]
# Musc-only subset (not used by the model below, which is fit on all males)
long.SC.mouse.avs_4MM_Mmale <- long.SC.mouse.avs_4MM_male[long.SC.mouse.avs_4MM_male$subsp == "Musc", ]

# Divide males into Rec groups: PWD and MSM males are high rec (1), all others low rec (0)
long.SC.mouse.avs_4MM_male$Rec.group <- ifelse(grepl("PWD male|MSM male", long.SC.mouse.avs_4MM_male$category), 1, 0)

long.biv.log.reg <- glm(Rec.group ~ mean_SC,
                        data = long.SC.mouse.avs_4MM_male, family = binomial(link = "logit"))

summary(long.biv.log.reg)  # NS effect (likely underpowered)

When all male mice are used, the predictive power is greater than when just the Musc strains are used. When just the Musc strains are used, the mouse mean SC is still a significant predictor of whether a mouse is in the high or low Rec group (p = 0.0221). (Should I consider running this on females too?)

Is the prediction, high rec musc male strains have long SC met?

In a logistic regression, mouse average SC length is modestly predictive of whether a mouse is in a high or low Rec strain. I couldn’t get the mixed models working for the male polymorphism predictions.

IFD

ToDo: make a melt.DF with IFD1, IFD2 and IFD3 in a single column.
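One way to do this melt with base R’s reshape(), as a sketch. `IFD1_ABS` is a column of Curated_BivData; the names `IFD2_ABS`, `IFD3_ABS` and `biv_id`, and all the values in the toy table, are assumptions made for illustration.

```r
# Toy wide table: one row per bivalent, one column per interfocal distance
wide <- data.frame(
  biv_id = c("b1", "b2", "b3"),
  hand.foci.count = c(2, 3, 4),
  IFD1_ABS = c(55.2, 40.1, 30.5),
  IFD2_ABS = c(NA, 38.7, 28.0),
  IFD3_ABS = c(NA, NA, 26.4)
)

# Stack the three IFD columns into one value column plus an index column
ifd_long <- reshape(wide, direction = "long",
                    varying = c("IFD1_ABS", "IFD2_ABS", "IFD3_ABS"),
                    v.names = "IFD_ABS",
                    timevar = "IFD_index",
                    times = c("IFD1", "IFD2", "IFD3"),
                    idvar = "biv_id")

# Bivalents with fewer foci have no 2nd or 3rd interval; drop those empty rows
ifd_long <- ifd_long[!is.na(ifd_long$IFD_ABS), ]
rownames(ifd_long) <- NULL

ifd_long
```

The long table has one row per IFD observation, which is the shape needed for plotting each point as a single IFD.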

Explain the difference between the raw and PER measures. Why should I use / examine the IFD.PER?

  3. Interference / IFD will not be sexually dimorphic.

B.1) Interference/IFD will be shorter in high Rec strains to allow more foci to fit on a single bivalent.

Interfocal distance (IFD), an indication of CO interference.

The raw measures are pretty noisy; this might not be the best way to display these results.

Each point is an IFD observation. Maybe add mouse level lines for the means.

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

Fixed Effects

Mixed model analysis for IFD (interference). The first set of models is made with the lme() function.

\[mouse \ average \ IFD ~=~ subsp * sex + rand(strain*sex) + \varepsilon \]

Sex is a slightly significant factor for raw and normalized IFD.

Random Effects

This second set of models is made using lmer().

\[mouse \ average \ IFD ~=~ subsp * sex + rand(strain) + \varepsilon \]

\[mouse \ average \ normalized \ IFD ~=~ subsp * sex + rand(strain) + \varepsilon \]

I tested 2 versions of the mixed model for this flavor of trait: the raw IFD and the normalized IFD measure. The tables below are from anova() for the lmer models.

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0, p-value = 1
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 1.9997, p-value = 0.0548

The random effect of strain is not significant for ABS IFD (RLRT = 0, p = 1) and only marginal for IFD.PER (RLRT = 1.9997, p = 0.0548).

IFD 3CO bivalents

Run comparisons for 3CO bivalents.

Heterochiasmy Prediction

Do the IFD measures lack significant sex effect? (as predicted)

There are sex effects for the lme() models for both the ABS and PER IFD measures. There is slight evidence for significant strain effect.

Are there significant strain effects?

The random strain effect was tested for raw and normalized IFD; neither is significant at the 0.05 level.

Male Polymorphism Prediction

The polymorphism frame-work

prediction A, Is the Male polymorphism Prediction met? High rec strains have shorter IFDs?

Logistic regression:

\[ Rec \ Group ~=~ mouse \ average \ IFD \]

For these tests I am looking at the single IFD measure from 2CO bivalents.

## 
## Call:
## glm(formula = Rec.group ~ mean_IFD.2CO_ABS, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5101  -0.9963  -0.7545   1.2282   1.7485  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -5.64222    4.13508  -1.364    0.172
## mean_IFD.2CO_ABS  0.09329    0.07326   1.273    0.203
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.789  on 22  degrees of freedom
## Residual deviance: 29.006  on 21  degrees of freedom
## AIC: 33.006
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ mean_IFD.2CO_PER, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$sex == "male") & (mouse.avs_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1070  -1.0202  -0.9232   1.3397   1.4669  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)        -2.141      4.774  -0.448    0.654
## mean_IFD.2CO_PER    2.891      8.080   0.358    0.720
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.789  on 22  degrees of freedom
## Residual deviance: 30.660  on 21  degrees of freedom
## AIC: 34.66
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ IFD1_ABS, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$sex == "male") & 
##         (Curated_BivData$subsp == "Musc"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4620  -1.1260  -0.9415   1.2032   1.5650  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.953625   0.229122  -4.162 3.15e-05 ***
## IFD1_ABS     0.015399   0.003958   3.891 9.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1182.9  on 854  degrees of freedom
## Residual deviance: 1167.3  on 853  degrees of freedom
##   (2060 observations deleted due to missingness)
## AIC: 1171.3
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ IFD1_PER, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$sex == "male") & 
##         (Curated_BivData$subsp == "Musc"), ])
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.266  -1.138  -1.015   1.211   1.436  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.6342     0.2721  -2.330   0.0198 *
## IFD1_PER      0.9096     0.4523   2.011   0.0443 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1182.9  on 854  degrees of freedom
## Residual deviance: 1178.8  on 853  degrees of freedom
##   (2060 observations deleted due to missingness)
## AIC: 1182.8
## 
## Number of Fisher Scoring iterations: 4
## [1] 0.0001403108
## [1] 0.08244956
## Warning: Removed 2066 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2066 rows containing missing values (geom_point).

## Warning: Removed 2061 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2061 rows containing missing values (geom_point).

Re-order the above figure if possible.

Neither t.test is significant for the ABS or PER measures when I test just the Musc strains. The above t.tests are breaking knitr.
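I have not pinned down why the t.test chunk fails during knitting. One possible workaround, sketched below, is a wrapper that returns NA instead of stopping when a group has too few usable observations or t.test() throws an error. The helper `safe_t_p` is hypothetical and the example values are made up.

```r
# Return the two-sample t-test p value, or NA if the test cannot be run
safe_t_p <- function(x, g) {
  ok <- !is.na(x) & !is.na(g)
  x <- x[ok]
  g <- factor(g[ok])
  # t.test() needs exactly two groups with at least two observations each
  if (nlevels(g) != 2 || any(table(g) < 2)) return(NA_real_)
  tryCatch(t.test(x ~ g)$p.value, error = function(e) NA_real_)
}

# Works when both groups have enough data
safe_t_p(c(5.1, 4.8, 6.0, 5.5, 7.2, 6.9, 7.5, 8.1),
         rep(c("low", "high"), each = 4))

# Returns NA (instead of an error) when one group has a single observation
safe_t_p(c(5.1, 4.8, 6.0), c("low", "low", "high"))
```

An NA in the knitted report then flags the comparison that needs a closer look, without halting the whole document.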

The t.tests for IFD1s at the bivalent level for the high and low musc males are significant.

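Neither mouse-average logistic regression model for ABS or PER IFD is significant when just the Musc males are used (p = 0.203 and p = 0.720). The bivalent-level models are significant (IFD1_ABS p = 9.98e-05, IFD1_PER p = 0.0443). Assuming Rec.group is coded 1 = high rec, as in the long bivalent code above, their positive coefficients point to longer, not shorter, IFDs in the high Rec group.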

Post-hoc comparisons ideas?

Preliminary results from an independent data set indicated that PWD had longer IFDs, which goes against the simple prediction that more COs means denser spacing of foci on the same bivalent. This also indicated that interference distance may have evolved in the house mouse complex.

Prediction B: is there evidence for the alternative IFD measure?

General CO Positions

There are a few traits that fall within the CO positions

  • 1CO normalized position

  • sis.co.ten (sorta interference)

  • centromere and telomere distance

Predictions

Heterochiasmy

2.A) Normalized 1CO positions will be sexually dimorphic (Sardell and Kirkpatrick).

2.B) Sister cohesion tension (sis-co-ten) will be sexually dimorphic, as it reflects the general property of uniform vs telomere-biased CO positioning.

2.C) Centromere and telomere distances will be sexually dimorphic.

Polymorphism

C.1) It’s unknown whether normalized 1CO positions will differ between high and low rec Musc strains (no good predictions come to mind).

C.2) sis-co-ten metric will be maximized in higher recombining strains .. because.

C.3) telomere and centromere distances will be shorter in low rec strains

Normalized 1CO positions

Above plot focuses on the 1CO bivalent normalized positions since CO interference controls the general position of COs when there are multiple COs. This plot shows the sexual dimorphism in the density plots.

Consider adding annotate_text for the number of observations in each category. Think about adding vertical lines for the centromere and for the position means. Think about removing the extra Musc strains.

These box plots show that females have a much more medial position for single-focus bivalents (much closer to 50%) than males. They also show that the Foci1 position in Musc males is slightly more central / medial than in the Dom male strains. MOLF males have much more medial positions than other strains.

Mixed models norm 1CO positions

The distributions of SC lengths and sis-co-ten seem very different across sexes.

\[mouse \ average \ F1 \ position ~=~ subsp * sex + rand(strain) + \varepsilon \]

The mixed model data should only come from 1CO bivalent data.

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 14.982, p-value < 2.2e-16
## 
## Call:
## glm(formula = Rec.group ~ norm.first.foci.pos, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$subsp == "Musc") & (mouse.avs_4MM$sex == 
##         "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4905  -1.0443  -0.7194   1.2530   1.4731  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)           -7.760      6.626  -1.171    0.242
## norm.first.foci.pos   11.125      9.936   1.120    0.263
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 29.767  on 21  degrees of freedom
## Residual deviance: 28.365  on 20  degrees of freedom
## AIC: 32.365
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ PER_Foci_1, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male") & (Curated_BivData$hand.foci.count == 
##         1), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8105  -0.7667  -0.7476  -0.7162   1.7278  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.2622     0.1773  -7.120 1.08e-12 ***
## PER_Foci_1    0.2351     0.2541   0.925    0.355    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1994.7  on 1778  degrees of freedom
## Residual deviance: 1993.9  on 1777  degrees of freedom
##   (237 observations deleted due to missingness)
## AIC: 1997.9
## 
## Number of Fisher Scoring iterations: 4

The mouse average Foci1 position is more significant in the t.test than in the logistic regression (is something wrong?). Check the mouse averages for F1_pos; there might be an outlier or a mouse with very few observations.
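The check suggested here could look roughly like the sketch below. The table `toy` and the column `mouse` are hypothetical and the values are simulated; `PER_Foci_1` follows the column name in Curated_BivData. The cutoffs (fewer than 5 observations, more than 3 MADs from the median) are arbitrary choices for illustration.

```r
set.seed(3)

# Toy 1CO bivalent table: four mice, one of which (m3) has only 2 observations
toy <- data.frame(
  mouse = rep(c("m1", "m2", "m3", "m4"), times = c(30, 28, 2, 25)),
  PER_Foci_1 = c(runif(30, 0.55, 0.75), runif(28, 0.55, 0.75),
                 c(0.20, 0.25), runif(25, 0.55, 0.75))
)

# Observations and mean normalized position per mouse
n_obs <- tapply(toy$PER_Foci_1, toy$mouse, function(x) sum(!is.na(x)))
mouse_mean <- tapply(toy$PER_Foci_1, toy$mouse, mean, na.rm = TRUE)

per_mouse <- data.frame(mouse = names(n_obs),
                        n_obs = as.vector(n_obs),
                        mean_pos = as.vector(mouse_mean))

# Flag mice with very few bivalents, and mouse means far from the other means
per_mouse$few_obs <- per_mouse$n_obs < 5
per_mouse$outlier <- abs(per_mouse$mean_pos - median(per_mouse$mean_pos)) >
  3 * mad(per_mouse$mean_pos)

per_mouse
```

A mouse flagged in both columns is a natural candidate to drop or down-weight before rerunning the logistic regression.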

Siscoten

The metric Sis-co-ten measures the amount of sister cohesion connected to the other pole.

The logic of the sis-co-ten metric is outlined in the figure below. The goal is to use this metric to model different amounts of tension-active cohesion as a consequence of different numbers and placements of chiasmata/COs. The metric is calculated using SC area as a proxy for the amount of cohesion at metaphase.

from (Lee, J. (2019). Is age-related increase of chromosome segregation errors in mammalian oocytes caused by cohesin deterioration?. Reproductive Medicine and Biology.)


## Warning: Removed 135 rows containing missing values (geom_point).

## Warning: Removed 46 rows containing missing values (geom_point).

## Warning: Removed 58 rows containing missing values (geom_point).

## Warning: Removed 32 rows containing missing values (geom_point).

## Warning: Removed 60 rows containing missing values (geom_point).

## Warning: Removed 44 rows containing missing values (geom_point).

## Warning: Removed 12 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 11 rows containing missing values (geom_point).

## Warning: Removed 13 rows containing missing values (geom_point).

Males have much clearer separation of siscoten across chrm classes. This is emphasized when SC length is also plotted. It seems like musc males have higher amounts of this metric compared to Dom males (which might)

To formally test the differences in sis-co-ten I plan to write a sub sampling / permutation loop to compare the mean(sis.co.ten) of the same numbers of bivalents of the same class.
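A minimal sketch of such a subsampling / permutation comparison for one chromosome class, under the assumption that two groups are compared at a time. The table `toy`, its group sizes and its values are simulated; only the column name `SisCoTen` and the group labels follow this document.

```r
set.seed(7)

# Toy bivalents of a single class (e.g. 1CO) from two groups of unequal size
toy <- data.frame(
  group = rep(c("Musc male", "Dom male"), times = c(120, 60)),
  SisCoTen = c(rnorm(120, mean = 45, sd = 10), rnorm(60, mean = 40, sd = 10))
)

# Subsample so both groups contribute the same number of bivalents
n_sub <- min(table(toy$group))
balanced <- do.call(rbind, lapply(split(toy, toy$group), function(d) {
  d[sample(nrow(d), n_sub), ]
}))

# Difference in mean sis-co-ten between the two groups
mean_diff <- function(values, labels) {
  mean(values[labels == "Musc male"]) - mean(values[labels == "Dom male"])
}
obs_diff <- mean_diff(balanced$SisCoTen, balanced$group)

# Null distribution: shuffle the group labels and recompute the difference
n_perm <- 2000
null_diff <- replicate(n_perm,
                       mean_diff(balanced$SisCoTen, sample(balanced$group)))

# Two-sided permutation p value (the +1 keeps it from being exactly 0)
p_perm <- (sum(abs(null_diff) >= abs(obs_diff)) + 1) / (n_perm + 1)

obs_diff
p_perm
```

Repeating the subsampling step several times and looking at the spread of `p_perm` would show how sensitive the result is to which bivalents happen to be drawn.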

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 5.2467, p-value = 0.0078

The fixed effects, sex and sex*subsp are significant. The random strain effect is also significant.

Is the heterochiasmy prediction met?

Yes. In the model predicting the mouse average siscoten, sex and the sex-subsp interaction are significant factors. The random strain effect is also significant (RLRT = 5.2467, p = 0.0078).

## 
## Call:
## glm(formula = Rec.group ~ mean.siscoten, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$subsp == "Musc") & (mouse.avs_4MM$sex == 
##         "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6213  -0.7112  -0.2700   0.3972   1.9960  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -14.1864     6.1993  -2.288   0.0221 *
## mean.siscoten   0.3693     0.1664   2.220   0.0265 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 29.767  on 21  degrees of freedom
## Residual deviance: 18.769  on 20  degrees of freedom
## AIC: 22.769
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = Rec.group ~ SisCoTen, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3297  -0.9008  -0.7578   1.3452   1.7477  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.312692   0.082074 -15.994   <2e-16 ***
## SisCoTen     0.015126   0.001818   8.319   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3308.5  on 2633  degrees of freedom
## Residual deviance: 3238.0  on 2632  degrees of freedom
##   (44 observations deleted due to missingness)
## AIC: 3242
## 
## Number of Fisher Scoring iterations: 4

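All the sis.co.ten tests are significant, though not equally strongly: p = 0.0265 for the mouse-average model versus p < 2e-16 for the bivalent-level model. Maybe I should consider running a normalized sis.co.ten? I think nrm_siscoten would still reflect the differing cohesion structure/outcome.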
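One way to check whether normalizing helps is to fit the same logistic model with the raw and the normalized predictor and compare them, as a sketch. It assumes that normalizing means dividing sis.co.ten by SC length and that Rec.group is a 0/1 response; the data frame `biv` and its values are simulated.

```r
set.seed(3)
n <- 500
# Simulated stand-in for bivalent-level data (hypothetical values)
biv <- data.frame(
  chromosomeLength = runif(n, 60, 140),
  Rec.group        = rbinom(n, 1, 0.35)
)
biv$SisCoTen <- biv$chromosomeLength * runif(n, 0.1, 0.8) + 10 * biv$Rec.group

# Assumed normalization: sis.co.ten as a fraction of SC length
biv$nrm_siscoten <- biv$SisCoTen / biv$chromosomeLength

fit_raw <- glm(Rec.group ~ SisCoTen, family = binomial(link = "logit"), data = biv)
fit_nrm <- glm(Rec.group ~ nrm_siscoten, family = binomial(link = "logit"), data = biv)

# Same response and same rows, so the two AIC values are directly comparable
aic_tab <- AIC(fit_raw, fit_nrm)
aic_tab

# Odds ratios per unit of each predictor (the units differ between the models)
exp(coef(fit_raw)["SisCoTen"])
exp(coef(fit_nrm)["nrm_siscoten"])
```

The comparison is only fair when both models use the same rows, so bivalents with a missing value in either predictor would need to be dropped from both fits.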

Telomere and Centromere Distance

My metrics for telomere and centromere distance measure the distance from each end of the bivalent (SC) to the nearest focus. In the plots below, each point is a single bivalent. I chose not to use the centromere mark because it seems noisy and inconsistent…
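As a sketch of how these distances can be computed for one bivalent, assuming foci positions are measured from the centromere end of the SC. The helper `end_distances()` is hypothetical; the output names follow the column names that appear in the model output in this document.

```r
# Hypothetical helper: distance from each SC end to the nearest focus.
# Assumes foci positions are measured from the centromere end of the SC.
end_distances <- function(foci_pos, sc_length) {
  foci_pos <- foci_pos[!is.na(foci_pos)]
  if (length(foci_pos) == 0) {
    # No foci recorded: every distance is undefined
    return(c(dis.cent = NA_real_, telo_dist = NA_real_,
             dis.cent.PER = NA_real_, telo_dist_PER = NA_real_))
  }
  cent <- min(foci_pos)               # focus nearest the centromere end
  telo <- sc_length - max(foci_pos)   # focus nearest the telomere end
  c(dis.cent = cent, telo_dist = telo,
    dis.cent.PER = cent / sc_length,  # normalized by SC length
    telo_dist_PER = telo / sc_length)
}

one_co <- end_distances(c(70, NA), sc_length = 100)
two_co <- end_distances(c(20, 90), sc_length = 100)
one_co
two_co
```

On a 1CO bivalent both distances refer to the same focus, so they sum to the SC length; on a 2CO bivalent they refer to different foci.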

## Warning: Removed 81 rows containing missing values (geom_point).

## Warning: Removed 78 rows containing missing values (geom_point).

On average, males have a much lower raw telomere distance than females (reflecting the telomere bias). In males, 2CO bivalents have very low telomere distances, while 1CO bivalents have a greater range. In females, the ranges of telomere distances overlap much more.

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 7.3832, p-value = 0.0018

Mixed model result summary:

## 
## Call:
## glm(formula = Rec.group ~ mean.telo.dist, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$subsp == "Musc") & (mouse.avs_4MM$sex == 
##         "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9683  -0.9378  -0.4362   1.0330   1.5370  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      5.3353     3.2086   1.663   0.0963 .
## mean.telo.dist  -0.2962     0.1679  -1.764   0.0778 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 29.767  on 21  degrees of freedom
## Residual deviance: 25.245  on 20  degrees of freedom
## AIC: 29.245
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ telo_dist, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0309  -0.9074  -0.8371   1.4522   1.8995  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.558268   0.064237  -8.691  < 2e-16 ***
## telo_dist   -0.009689   0.002381  -4.069 4.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3343.9  on 2673  degrees of freedom
## Residual deviance: 3326.6  on 2672  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 3330.6
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ telo_dist_PER, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1736  -0.9228  -0.8017   1.4166   1.8688  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.44260    0.06617  -6.689 2.25e-11 ***
## telo_dist_PER -1.23905    0.20677  -5.992 2.07e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3343.9  on 2673  degrees of freedom
## Residual deviance: 3306.0  on 2672  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 3310
## 
## Number of Fisher Scoring iterations: 4
## Warning: Removed 122 rows containing missing values (geom_point).

## Warning: Removed 123 rows containing missing values (geom_point).

## boundary (singular) fit: see ?isSingular
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0, p-value = 1
## 
## Call:
## glm(formula = Rec.group ~ mean.cent.dist, family = binomial(link = "logit"), 
##     data = mouse.avs_4MM[(mouse.avs_4MM$subsp == "Musc") & (mouse.avs_4MM$sex == 
##         "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6243  -0.8330  -0.1981   0.7300   1.8471  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     16.6457     7.7957   2.135   0.0327 *
## mean.cent.dist  -0.4686     0.2169  -2.161   0.0307 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 29.767  on 21  degrees of freedom
## Residual deviance: 20.589  on 20  degrees of freedom
## AIC: 24.589
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = Rec.group ~ dis.cent, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0718  -0.9167  -0.8091   1.3966   1.9594  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.299287   0.085293  -3.509  0.00045 ***
## dis.cent    -0.012112   0.002067  -5.860 4.64e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3312.4  on 2636  degrees of freedom
## Residual deviance: 3276.8  on 2635  degrees of freedom
##   (41 observations deleted due to missingness)
## AIC: 3280.8
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ dis.cent.PER, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1539  -0.9239  -0.7551   1.3347   1.8129  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.12100    0.08579  -1.411    0.158    
## dis.cent.PER -1.35405    0.16746  -8.086 6.17e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3312.4  on 2636  degrees of freedom
## Residual deviance: 3244.9  on 2635  degrees of freedom
##   (41 observations deleted due to missingness)
## AIC: 3248.9
## 
## Number of Fisher Scoring iterations: 4

The normalized centromere plots show that, on 2CO bivalents, the first CO is closer to the centromere end in Musc males than in Dom males.

Females have more overlap in the distributions of centromere distances across chromosome class compared to males.

Heterochiasmy Prediction

Is sex a significant effect for the 1CO normalized CO position? (as predicted)

Is the random strain effect significant?

The random strain effect seems very significant.

Remember to use the mouse-average table mouse.avs_4MM (I don’t think I need the MELT data frame).

#make point plots / boxplots which show differences in mean positions 

#scatter + boxplot for t.tests

Dr. Broman suggested that the Kolmogorov-Smirnov test (curve comparison) wasn’t the best test for differences in general CO position. He suggested doing a simple t-test on the positions instead.
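The two tests can be run side by side with base R, as a sketch. The normalized positions are simulated from beta distributions purely for illustration; the vectors `pos_female` and `pos_male` are hypothetical.

```r
set.seed(4)
# Simulated normalized 1CO positions (0 = centromere end, 1 = telomere end)
pos_female <- rbeta(150, 3, 3)
pos_male   <- rbeta(150, 5, 2)

# Welch two-sample t-test: asks whether the mean position differs
tt <- t.test(pos_male, pos_female)

# Two-sample Kolmogorov-Smirnov test: asks whether the whole distributions differ
ks <- ks.test(pos_male, pos_female)

c(t_p = tt$p.value, ks_p = ks$p.value)
tt$estimate   # group means, the quantity the t-test compares
```

The t-test answers the narrower question of a shift in mean position, which is easier to report and interpret than a general difference in curve shape.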

# Try remaking the plot Megan suggested:
# for 2CO bivalents, Foci1 position on x and Foci2 position on y
library(ggplot2)

# Isolate 2CO bivalents (which() drops rows where hand.foci.count is NA)
CurBivData_2CO <- Curated_BivData[which(Curated_BivData$hand.foci.count == 2), ]

# Drop bivalents with a missing or empty second focus position
CurBivData_2CO <- CurBivData_2CO[!(is.na(CurBivData_2CO$Foci2) | CurBivData_2CO$Foci2 == ""), ]

# Facet by sex and subsp
F1.x.F2 <- ggplot(CurBivData_2CO, aes(x = Foci1, y = Foci2, color = strain)) +
  geom_point() +
  facet_grid(subsp ~ sex) +
  ggtitle("test plot")
F1.x.F2

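# What is the pattern of variance?
# Run analyses for each subsp*sex combination, using the non-melt data frame.

# How is the variance partitioned across cell, mouse, and strain?

female.Dom <- Curated_BivData[which(Curated_BivData$sex == "female" &
                                    Curated_BivData$subsp == "Dom"), ]

# Normalized position of the first focus (fraction of SC length)
female.Dom$Foc1.PER <- female.Dom$Foci1 / female.Dom$chromosomeLength

# Unorder strain and mouse. factor(as.character(x)) keeps the original labels;
# unclass() followed by as.factor() would replace them with integer codes.
female.Dom$mouse <- factor(as.character(female.Dom$mouse))
female.Dom$strain <- factor(as.character(female.Dom$strain))

# 1CO bivalents only (which() also drops rows with NA hand.foci.count)
female.Dom_1CO <- female.Dom[which(female.Dom$hand.foci.count == 1), ]

# 1CO first
# fileName (cell) is nested in mouse, which is nested in strain. With fileName
# entered first, mouse and strain are presumably fully aliased with it and get
# no sum of squares, so enter the terms from the top level down and read the
# sequential (Type I) table. This assumes mouse and fileName labels are unique
# across strains and mice.
modo <- lm(Foc1.PER ~ strain + mouse + fileName, data = female.Dom_1CO)
anova(modo)

# Notes from the original fit (Foc1.PER ~ fileName + mouse + strain):
# can't get mouse and strain to have sum of squares
# residual size decreases with per.F1
# residuals much larger than fileName, mouse and strain no

# model <- lm(breaks ~ wool * tension,
#             data = warpbreaks,
#             contrasts = list(wool = "contr.sum", tension = "contr.poly"))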

male.Dom <- Curated_BivData[which(Curated_BivData$sex == "male" &
                                  Curated_BivData$subsp == "Dom"), ]

# Unorder mouse and strain while keeping their labels
male.Dom$mouse <- factor(as.character(male.Dom$mouse))
male.Dom$strain <- factor(as.character(male.Dom$strain))

# 1CO bivalents only (which() also drops rows with NA hand.foci.count)
male.Dom <- male.Dom[which(male.Dom$hand.foci.count == 1), ]

male.Dom$Foc1.PER <- male.Dom$Foci1 / male.Dom$chromosomeLength

# `|` is not a nesting operator in an lm() formula. Enter the nested terms from
# the top level down instead (labels assumed unique across strains and mice).
male.modo <- lm(Foc1.PER ~ strain + mouse + fileName, data = male.Dom)
anova(male.modo)

# Note from the original fit (Foc1.PER ~ fileName | mouse | strain):
# only file name is registering as effect
# Review ANOVA frameworks
# http://www.biostathandbook.com/nestedanova.html
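The effect of term order in a nested design (cells within mice within strains) can be checked on simulated data, as a sketch. The design sizes, effect sizes, and the helper `eff()` are hypothetical; only the column names mirror the ones used in this document.

```r
set.seed(5)
# Simulated nested design:
# 3 strains x 4 mice per strain x 5 cells per mouse x 6 bivalents per cell
d <- expand.grid(biv = 1:6, cell = 1:5, mouse = 1:4, strain = 1:3)
d$strain   <- factor(paste0("s", d$strain))
d$mouse    <- factor(paste(d$strain, d$mouse, sep = "_m"))  # unique per strain
d$fileName <- factor(paste(d$mouse, d$cell, sep = "_c"))    # unique per mouse

# One random offset per level of a factor, expanded to one value per row
eff <- function(f, sd) rnorm(nlevels(f), sd = sd)[as.integer(f)]
d$Foc1.PER <- 0.5 + eff(d$strain, 0.05) + eff(d$mouse, 0.03) +
  eff(d$fileName, 0.02) + rnorm(nrow(d), sd = 0.1)

# Top level first: each term gets the variation left over by the level above it
top_down <- anova(lm(Foc1.PER ~ strain + mouse + fileName, data = d))

# Lowest level first: fileName absorbs the mouse and strain variation
bottom_up <- anova(lm(Foc1.PER ~ fileName + mouse + strain, data = d))

top_down
bottom_up
```

In the top-down table the degrees of freedom follow the nesting (strains minus one, then mice within strains, then cells within mice), whereas in the bottom-up fit fileName takes all of them. A mixed model with random mouse and cell terms would be the alternative if variance components, rather than sums of squares, are wanted.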

Male Polymorphism Prediction

I didn’t have a good prediction for the male polymorphism… So I’ll just do post-hoc comparisons for the groups to test if they are different.
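A minimal sketch of such post-hoc comparisons with base R, assuming the groups are strains and the response is a mouse-level mean. The data frame `males`, the strain labels, and the column `mean.CO.pos` are hypothetical, and the values are simulated.

```r
set.seed(6)
# Simulated mouse-level means for male mice from four hypothetical strains
males <- data.frame(
  strain      = factor(rep(c("strainA", "strainB", "strainC", "strainD"), each = 6)),
  mean.CO.pos = rnorm(24, mean = rep(c(0.55, 0.56, 0.62, 0.63), each = 6), sd = 0.03)
)

# One-way ANOVA followed by Tukey's honest significant differences
fit <- aov(mean.CO.pos ~ strain, data = males)
tuk <- TukeyHSD(fit, "strain")
tuk$strain   # one row per pair: difference, CI bounds, adjusted p-value

# Alternative: pairwise t-tests with Holm correction for multiple testing
pw <- pairwise.t.test(males$mean.CO.pos, males$strain, p.adjust.method = "holm")
pw$p.value
```

Both approaches adjust for the number of pairs compared, which matters here because the comparisons are exploratory rather than predicted in advance.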

#CO_pos